Demographic Inference and Representative Population Estimates from Multilingual Social Media Data
Social media provide access to behavioural data at an unprecedented scale and
granularity. However, using these data to understand phenomena in a broader
population is difficult due to their non-representativeness and the bias of
statistical inference tools towards dominant languages and groups. While
demographic attribute inference could be used to mitigate such bias, current
techniques are almost entirely monolingual and fail to work in a global
environment. We address these challenges by combining multilingual demographic
inference with post-stratification to create a more representative population
sample. To learn demographic attributes, we create a new multimodal deep neural
architecture for joint classification of age, gender, and organization-status
of social media users that operates in 32 languages. This method substantially
outperforms current state of the art while also reducing algorithmic bias. To
correct for sampling biases, we propose fully interpretable multilevel
regression methods that estimate inclusion probabilities from inferred joint
population counts and ground-truth population counts. In a large experiment
over multilingual heterogeneous European regions, we show that our demographic
inference and bias correction together allow for more accurate estimates of
populations and make a significant step towards representative social sensing
in downstream applications with multilingual social media.
Comment: 12 pages, 10 figures, Proceedings of the 2019 World Wide Web Conference (WWW '19).
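The post-stratification step described above can be sketched in a few lines. Everything here is hypothetical for illustration: the demographic cells, the counts, and the direct share-ratio weighting (the paper itself estimates inclusion probabilities with multilevel regression rather than this simple ratio).

```python
# Hedged sketch of post-stratification reweighting. Cells and counts
# are invented; the ratio of population share to sample share stands
# in for the paper's regression-based inclusion probabilities.

def poststratify(sample_counts, population_counts):
    """Weight per demographic cell so that weighted sample totals
    match the known population distribution."""
    pop_total = sum(population_counts.values())
    samp_total = sum(sample_counts.values())
    weights = {}
    for cell, n in sample_counts.items():
        pop_share = population_counts[cell] / pop_total
        samp_share = n / samp_total
        weights[cell] = pop_share / samp_share  # up/down-weight the cell
    return weights

# Younger users are over-represented in this hypothetical sample:
sample = {"18-29": 700, "30-49": 200, "50+": 100}
population = {"18-29": 200, "30-49": 400, "50+": 400}
weights = poststratify(sample, population)
reweighted = {c: sample[c] * weights[c] for c in sample}
# reweighted cell totals now match the population counts
```

After reweighting, any downstream statistic computed as a weighted sum over users reflects the target population's demographic mix rather than the platform's.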
The importance of neutral examples for learning sentiment
Most research on learning to identify sentiment ignores “neutral” examples, learning only from examples of significant (positive or negative) polarity. We show that it is crucial to use neutral examples in learning polarity for a variety of reasons. Learning from negative and positive examples alone will not permit accurate classification of neutral examples. Moreover, the use of neutral training examples in learning facilitates better distinction between positive and negative examples.
Authorship Verification as a one-class classification problem
In the authorship verification problem, we are given examples of the writing of a single author and are asked to determine if given long texts were or were not written by this author. We present a new learning-based method for adducing the “depth of difference” between two example sets and offer evidence that this method solves the authorship verification problem with very high accuracy. The underlying idea is to test the rate of degradation of the accuracy of learned models as the best features are iteratively dropped from the learning process.
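The degradation test can be sketched in miniature. The vectors below are synthetic, and a leave-one-out nearest-centroid classifier stands in for the learned models of the paper; the point is only the shape of the curve, which collapses once the few strongest features are removed.

```python
# Toy sketch of the iterative feature-dropping ("unmasking") idea.
# Data are synthetic; nearest-centroid replaces the paper's models.

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def loo_accuracy(a, b):
    """Leave-one-out nearest-centroid accuracy on two vector sets."""
    correct = 0
    for i, v in enumerate(a):
        correct += dist(v, centroid(a[:i] + a[i + 1:])) < dist(v, centroid(b))
    for i, v in enumerate(b):
        correct += dist(v, centroid(b[:i] + b[i + 1:])) < dist(v, centroid(a))
    return correct / (len(a) + len(b))

def unmasking_curve(a, b, rounds=2):
    a = [list(v) for v in a]
    b = [list(v) for v in b]
    curve = []
    for _ in range(rounds):
        curve.append(loo_accuracy(a, b))
        # drop the single feature whose class means differ most
        diffs = [abs(x - y) for x, y in zip(centroid(a), centroid(b))]
        k = diffs.index(max(diffs))
        for v in a + b:
            del v[k]
    return curve

# Two samples that differ on one strong feature (index 0), as two works
# by the same author might: separability collapses once it is dropped.
a = [[3.0, 4.0], [3.0, 5.0], [3.0, 6.0]]
b = [[0.0, 4.0], [0.0, 5.0], [0.0, 6.0]]
curve = unmasking_curve(a, b)  # accuracy per round, e.g. [1.0, 0.0]
```

For genuinely different authors, many features carry signal, so accuracy degrades slowly instead; the rate of degradation is the verification cue.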
Using neutral examples for learning polarity
Sentiment analysis is an example of polarity learning. Most research on learning to identify sentiment ignores “neutral” examples and instead performs training and testing using only examples of significant polarity. We show that it is crucial to use neutral examples in learning polarity for a variety of reasons and show how neutral examples help us obtain superior classification results in two sentiment analysis test-beds.
Many machine-learning problems involve predicting an example's polarity: is it (significantly) greater than or less than some standard? One canonical example of learning polarity is sentiment analysis, the determination of whether a particular text expresses positive or negative sentiment regarding some issue. The problem of how to exploit a labeled corpus to learn models for sentiment analysis has attracted a good deal of interest in recent years [Dave et al. 2003, Pang et al. 2002, Shanahan et al. 2005]. One common characteristic of almost all this work has been the tendency to define the task as a two-category problem: positive versus negative. In almost all actual polarity problems, including sentiment analysis, there are, however, three categories that must be distinguished: positive, negative, and neutral. Not every comment on a product or experience expresses purely positive or negative sentiment. Some – in many cases, most – comments might report objective facts without expressing any sentiment, while others might express mixed or conflicting sentiment. Researchers are aware, of course, of the existence of neutral documents. The rationale for ignoring them has been a reliance on two tacit assumptions:
• Solving the binary positive vs. negative problem automatically solves the three-category problem, since neutral documents will simply lie near the boundary of the binary model.
• There is less to learn from neutral documents than from documents with clearly defined sentiment.
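The binary-versus-ternary distinction can be shown with a deliberately tiny lexicon classifier. The lexicon and documents are invented, and real systems learn such cues from labeled data, but the failure mode is the same: a two-class model must assign a polar label even to text with no sentiment cues at all.

```python
# Toy illustration (invented lexicon and documents) of why a binary
# positive/negative model cannot handle neutral text.

POS = {"great", "excellent", "love"}
NEG = {"terrible", "awful", "hate"}

def sentiment_score(text):
    words = text.lower().split()
    return sum(w in POS for w in words) - sum(w in NEG for w in words)

def binary_label(text):
    # forced choice: no neutral option exists in a two-class model
    return "positive" if sentiment_score(text) >= 0 else "negative"

def ternary_label(text):
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

doc = "the battery lasts ten hours"   # objective, sentiment-free
forced = binary_label(doc)            # misleadingly polar
correct = ternary_label(doc)          # "neutral"
```

The binary model labels the objective sentence as polar simply because it has nowhere else to put it, which is exactly the boundary assumption the abstract argues against.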
Exploiting Stylistic Idiosyncrasies for Authorship Attribution
Introduction. Early researchers in authorship attribution used a variety of statistical methods to identify stylistic discriminators: characteristics which remain approximately invariant within the works of a given author but which tend to vary from author to author (Holmes 1998, McEnery & Oakes 2000). In recent years, machine learning methods have been applied to authorship attribution; a few examples include (Matthews & Merriam 1993, Holmes & Forsyth 1995, Stamatatos et al. 2001, de Vel et al. 2001). Both the earlier "stylometric" work and the more recent machine-learning work have tended to focus on initial sets of candidate discriminators which are fairly ubiquitous. For example, the classical work of Mosteller and Wallace (1964) on the Federalist Papers used a set of several hundred function words, that is, words that are context-independent and hence unlikely to be biased towards specific topics. Other features used in even earlier work (Yule 1938) are complexity-based.
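Function-word features of this kind can be sketched very simply. The word list and example text below are illustrative only, far smaller than Mosteller and Wallace's actual several-hundred-word set.

```python
# Sketch of topic-neutral function-word features. The list is a tiny
# illustrative stand-in for a full function-word inventory.

FUNCTION_WORDS = ["the", "of", "and", "to", "upon", "while", "whilst"]

def function_word_profile(text):
    """Relative frequency of each function word: a stylistic
    fingerprint largely independent of topic."""
    tokens = text.lower().split()
    return [tokens.count(w) / len(tokens) for w in FUNCTION_WORDS]

profile = function_word_profile("The cat sat upon the mat")
# profile[0] is the rate of "the": 2 of 6 tokens
```

Because such words are context-independent, two texts on different topics by the same author tend to have closer profiles than two texts on the same topic by different authors, which is what makes them useful discriminators.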